Abstract

TASCAR is a toolbox for creation and rendering of dynamic acoustic
scenes that allows direct user interaction and was developed for
application in hearing aid research. This paper describes the
simulation methods and shows two research applications in combination
with motion tracking as an example. The first study investigated to
what extent individual head movement strategies can be found in
different listening tasks. The second study investigated the effect of
presentation of dynamic acoustic cues on the postural stability of the
listeners.
1 Introduction

Hearing aids are evolving from simple amplifiers to complex processing
devices. Algorithms in hearing devices, e.g., directional microphones,
direction of arrival estimators, or binaural noise reduction, depend on
the spatial properties of the surrounding acoustic environment
[Hamacher et al., 2005]. Several studies show a large performance gap
between laboratory measurements and real life experience, attributed to
a changed user behavior [Smeds et al., 2006] as well as
oversimplification of the test environment [Cord et al., 2004; Bentler,
2005]. To bridge this gap, a reproduction of complex listening
environments in the laboratory is desired. To allow for a systematic
evaluation of hearing device performance, these virtual acoustic
environments need to be scalable and reproducible. There are several
requirements for a virtual acoustic environment to make it suitable for
hearing research. For human listening, a high plausibility of the
environments and a reproduction of the relevant perceptual cues are
required. For machine listening and processing in multi-microphone
hearing devices, a correct reproduction of relevant physical properties
is needed. For an ecologically valid evaluation of hearing devices, the
virtual acoustic environments need to reflect relevant every-day
scenarios. Additionally, to assess limitations of hearing devices,
realistic but challenging environments are required. In both cases, the
reproduction needs to allow for listener movements in the environment
and may contain moving sources.

Existing virtual acoustic environment engines often target authentic
simulations for room acoustics (e.g., EASE, ODEON), resulting in high
complexity. They typically render impulse responses for off-line
analysis or auralization. Other tools, e.g., the SoundScapeRenderer
[Ahrens et al., 2008], do not provide all features required here, such
as room simulation and diffuse source handling. Therefore, a toolbox
for acoustic scene creation and rendering (TASCAR) was developed as a
Linux audio application. The aim of TASCAR is to interactively
reproduce time-varying complex listening environments via loudspeakers
or headphones. For a seamless integration into existing measurement
tools of psycho-acoustics and audiology, low-delay real-time processing
of external audio streams in the time domain is applied, and
interactive modification of the geometry is possible. TASCAR consists
of a standalone application for the acoustic simulation, and a set of
command line programs and Octave/Matlab scripts for recording from and
playing to jack ports, and measuring impulse responses.

The simulation methods and implementation are described in section 2.
Two research applications of TASCAR in combination with motion tracking
are shown as an example. The first study (section 3.1) investigates to
what extent individual head movement strategies can be found in
different listening tasks. Results indicate that individual strategies
exist in natural listening tasks, but task-specific behavior can be
found in tasks which include localization. The second study (section
3.2) investigates the effect of presentation of dynamic acoustic cues
on the postural stability of the listeners. Test subjects performed a
stepping test while exposed to stationary or spatially dynamic sounds.
Results show that in the absence of visual cues the spatial dynamics of
acoustic stimuli have a significant effect on postural stability.

2 TASCAR: Methods and implementation

The implementation of TASCAR utilizes the jack audio connection kit
[Davis and Hohn, 2003]. Audio content is exchanged between different
components of TASCAR via jack ports. The jack time line is used as a
base of all time-varying features. Audio signals are processed
block-wise in the time domain. A rough signal and data flow chart of
TASCAR is shown in Figure 1.

Figure 1: Schematic audio and control signal flow chart of TASCAR in a
typical hearing research subjective test application.

The structure of TASCAR can be divided into three major components:
Audio content is delivered by an audio player module. It provides a
non-blocking method of accessing sound file portions. Audio content can
also be delivered by external sources, e.g., from physical sources,
audio workstations, or any other jack client. The second major block is
the geometry processing of the virtual acoustic environment. The last
block is the acoustic model, i.e., the combination of audio content and
geometry information into an acoustic environment in a given render
format.


2.1 Geometry processing 

An acoustic scene in TASCAR consists of objects of several types: sound
sources, receivers, reflectors, and dedicated sound portals for coupled
room simulations [Grimm et al., 2014]. All object types have
trajectories defined by location in Cartesian coordinates and
orientation in ZYX Euler angles. These trajectories are linearly
interpolated between sparse time samples; the location is interpolated
either in Cartesian coordinates, or in spherical coordinates relative
to the origin. The orientation is interpolated in Euler angles. The
geometry is updated once in each processing cycle.
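
The linear interpolation of a location trajectory between sparse time samples can be sketched as follows. This is only an illustration in Python, not the TASCAR implementation; the function name `interpolate_position`, the data layout, and the clamping outside the sampled time range are assumptions made for this example.

```python
from bisect import bisect_right


def interpolate_position(times, positions, t):
    """Linearly interpolate a Cartesian position at time t.

    times:     sorted list of sample times in seconds
    positions: list of (x, y, z) tuples, one per sample time
    Outside the sampled range the end points are returned
    (an assumption of this sketch, not taken from the paper).
    """
    if t <= times[0]:
        return positions[0]
    if t >= times[-1]:
        return positions[-1]
    i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)  # interpolation weight in [0, 1)
    p0, p1 = positions[i - 1], positions[i]
    return tuple((1.0 - w) * a + w * b for a, b in zip(p0, p1))
```

Interpolation in spherical coordinates would apply the same weighting to azimuth, elevation and radius instead of x, y and z.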

Sound source objects can consist of multiple “sound vertices”, either
as vertices of a rigid body, i.e., following the orientation of the
object, or as a chain, i.e., at a given distance on the trajectory.
Each “sound vertex” is a primary source.

$p_{\mathrm{src}}$ is the primary source position, $p_{\mathrm{rec}}$
is the receiver position, and $O_{\mathrm{rec}}$ is the rotation matrix
of the receiver. Then
$p_{\mathrm{rel}} = O_{\mathrm{rec}}^{-1} (p_{\mathrm{src}} - p_{\mathrm{rec}})$
is the position of the sound source relative to the receiver, and
$r = \|p_{\mathrm{rel}}\|$ is the distance between source and receiver.
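
As a worked illustration of the relative position, the following Python sketch applies the inverse receiver rotation to the difference vector. It is a minimal example under stated assumptions: the helper `rotation_z` (a pure rotation about the z-axis) and the function names are hypothetical, and the inverse is computed as the transpose, which holds for any rotation matrix.

```python
import math


def rotation_z(yaw):
    """Rotation matrix for a rotation by yaw (radians) about the z-axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def relative_position(p_src, p_rec, o_rec):
    """Return (p_rel, r) for a source as seen from the receiver.

    o_rec is a 3x3 rotation matrix given as nested lists. Its inverse
    is its transpose, so row i of the inverse is column i of o_rec.
    """
    d = [a - b for a, b in zip(p_src, p_rec)]
    p_rel = [sum(o_rec[j][i] * d[j] for j in range(3)) for i in range(3)]
    r = math.sqrt(sum(x * x for x in p_rel))
    return p_rel, r
```

For a receiver at the origin that is rotated by 90 degrees about the z-axis, a source at (0, 2, 0) ends up on the positive x-axis of the receiver coordinate system, at a distance of 2.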

Reflectors can consist of polygon meshes with one or more faces. For
each mesh, reflection properties can be defined. For a first order
image source model, each pair of primary source and reflector face
creates an image source. For higher order image source models, also the
image sources of lower orders are taken into account. A schematic
sketch of the image model geometry is shown in Figure 2. The image
source position $p_{\mathrm{img}}$ is determined by the closest point
on the (infinite) reflector plane $p_{\mathrm{cut}}$ to the source
$p_{\mathrm{src}}$:
$p_{\mathrm{img}} = 2 p_{\mathrm{cut}} - p_{\mathrm{src}}$.
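
The mirroring step can be written down in a few lines. The Python sketch below is illustrative only; representing the reflector plane by one point on the plane and a normal vector is an assumption of this example, as are the function names.

```python
def closest_point_on_plane(p, plane_point, normal):
    """Orthogonal projection of p onto the infinite plane.

    The plane is given by one point on it and a normal vector,
    which does not need to have unit length.
    """
    nn = sum(n * n for n in normal)
    dist = sum((a - b) * n for a, b, n in zip(p, plane_point, normal)) / nn
    return tuple(a - dist * n for a, n in zip(p, normal))


def image_source(p_src, plane_point, normal):
    """Mirror p_src at the plane: p_img = 2 * p_cut - p_src."""
    p_cut = closest_point_on_plane(p_src, plane_point, normal)
    return tuple(2.0 * c - s for c, s in zip(p_cut, p_src))
```

For example, a source at (1, 2, 3) mirrored at the plane x = 0 gives an image source at (-1, 2, 3).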

The image source position is independent of the receiver position.
However, the visibility of an image source depends on the receiver
position and the reflector dimension. If the intersection point
$p_{\mathrm{is}}$ of the connection from the image source to the
receiver with the reflector plane is within the reflector boundaries,
the image source is visible, and a specular reflection is applied. If
$p_{\mathrm{is}}$ is not within the reflector boundaries, the effective
image source position is shifted into the direction of the closest
point on the boundary to $p_{\mathrm{is}}$, and an “edge reflection” is
applied. The differences between these two reflection types in terms of
audio processing are described in section 2.2.2.

Figure 2: Schematic sketch of the image model geometry. Left panel: “specular” reflection, i.e., 
the image source is visible within the reflector; right panel: edge reflection. 


2.2 Acoustic model 

For each pair of receiver and sound source (primary or image source) an
acoustic model is calculated. The acoustic model can be split into the
transmission model, which depends only on the distance between source
and receiver, an image source model, which depends on the reflection
properties of the reflecting surfaces as well as on the “visibility” of
the reflected image source, and a receiver model, which encodes the
direction of the sound source relative to the receiver into the render
output format.

2.2.1 Transmission model 

The transmission model consists of air absorption, and a time-varying
delay line for a simulation of Doppler shift and time-varying comb
filter effects.

Point sources follow a 1/r sound pressure law, 
i.e., doubling the distance r results in half of the 
sound pressure. Air absorption is approximated 
by a simple first order low-pass filter model with 
the filter coefficients controlled by the distance: 

$y_k = a_1 y_{k-1} + (1 - a_1) x_k$    (1)

$a_1 = e^{-\frac{r f_s}{c \alpha}}$,    (2)

where $c$ is the speed of sound, $f_s$ is the sampling rate, $x_k$ is
the source signal at the sample $k$, and $y_k$ is the filtered signal.
The empiric constant $\alpha = 7782$ was manually adjusted to provide
sensible values for distances below 50 meters. This approach is very
similar to that of [Huopaniemi et al., 1997], who used an FIR filter to
model the frequency response at certain distances. However, in this
approach the distance parameter $r$ can be varied dynamically.

The time-varying delay line uses nearest neighbor interpolation.¹
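
A time-varying delay line with nearest neighbor interpolation can be sketched as a circular buffer whose read position is rounded to the nearest whole sample. The Python class below is a minimal illustration, not the TASCAR code; the class name `DelayLine`, the default sampling rate of 44100 Hz and the speed of sound of 340 m/s are assumptions of this example.

```python
class DelayLine:
    """Circular-buffer delay line read with nearest neighbor interpolation."""

    def __init__(self, max_delay_samples):
        self.buf = [0.0] * (max_delay_samples + 1)
        self.pos = 0  # index at which the next input sample is written

    def process(self, x, distance, fs=44100.0, c=340.0):
        """Write sample x and return the sample delayed by distance / c seconds."""
        self.buf[self.pos] = x
        # Nearest neighbor interpolation: round the delay to a whole sample.
        delay = int(round(distance / c * fs))
        # Keep the delay within the range the buffer can represent.
        delay = min(max(delay, 0), len(self.buf) - 1)
        y = self.buf[(self.pos - delay) % len(self.buf)]
        self.pos = (self.pos + 1) % len(self.buf)
        return y
```

Because the distance is passed with every sample, a moving source changes the read position over time, which produces the Doppler shift. With nearest neighbor interpolation the delay changes in steps of whole samples.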

2.2.2 Image source model 

Early reflections are modeled using an image source model. In contrast
to most commonly used models (e.g., [Allen and Berkley, 1979]), which
calculate impulse responses for a rectangular enclosure (“shoebox
model”), reflections are simulated for each reflecting polygon-shaped
surface.

With finite reflectors, a distinction is made between a “specular”
reflection, when the image source is visible from the receiver position
within the reflector, and an “edge” reflection, when the image source
would not be “visible”. In both cases, the source signal is filtered
with a first order low-pass filter² determined by a reflectivity
coefficient $\rho$ and a damping coefficient $\delta$:

$y_k = \delta y_{k-1} + \rho x_k$    (3)
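
The recursion of the reflection filter, with one state variable per image source, can be sketched as follows. The function name `reflection_filter` and the list-based signal representation are chosen for illustration only.

```python
def reflection_filter(x, rho, delta):
    """First order low-pass y[k] = delta * y[k-1] + rho * x[k].

    x:     input samples (list of floats)
    rho:   reflectivity coefficient
    delta: damping coefficient
    The filter state starts at zero.
    """
    y = []
    y_prev = 0.0
    for xk in x:
        y_prev = delta * y_prev + rho * xk
        y.append(y_prev)
    return y
```

The impulse response of this filter is $\rho \delta^k$; for example, $\rho = \delta = 0.5$ yields 0.5, 0.25, 0.125 for the first three samples.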

For “edge” reflections, the effective image source is shifted so that
it appears to come from the direction of the point on the reflector
edge which is closest to $p_{\mathrm{is}}$. If receiver or sound source
are behind the reflector, the image source is not rendered.

¹ Other interpolation methods are planned.

² In later versions of TASCAR the reflection filter will be controlled
by frequency-dependent absorption coefficients to avoid the sample rate
dependency.

2.2.3 Receiver model 

A receiver encodes the output of the transmission model of each sound
source into the output format, based on the relative position between
sound source and receiver, $p_{k,\mathrm{rel}}$. Each receiver owns one
jack output port for each output channel $n$; the number of channels
depends on the receiver type and configuration. The receiver output
signal $z_k(n)$ for the output channel $n$ and sound source $k$ is

$z_k(n) = w(p_{k,\mathrm{rel}}, n)\, y_k$    (4)

with the transmission model output signal $y_k$; $w(p_{\mathrm{rel}}, n)$
are the driving weights for each output channel. The mixed output
signal of the whole virtual acoustic environment is the sum of $z_k(n)$
across all sources $k$, plus the diffuse sound signals decoded for the
respective receiver type (see section 2.3 for more details).

Several receiver types are implemented: Virtual omni-directional
microphones simply return the output without directional processing,
$w = 1$. Simple virtual cardioid microphones apply a gain $w$ depending
on the angle $\theta$ between source and receiver:

$w = \frac{1}{2} (\cos(\theta) + 1)$    (5)

For reproduction via multichannel loudspeaker arrays, receiver types
with one output channel for each loudspeaker can be used. A “nearest
speaker” receiver is a set of virtual loudspeakers at given positions
in space (typically matched with the physical loudspeaker setup). The
driving weight is 1 for the virtual loudspeaker with the least angular
distance to $p_{\mathrm{rel}}$, and 0 for all other channels. Other
receiver types are horizontal and full-periphonic 3rd order Ambisonics
[Daniel, 2001], VBAP [Pulkki, 1997], and “basic” as well as “in-phase”
ambisonic panning [Neukom, 2007].
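
The driving weights of the two simplest directional receiver types can be illustrated with a short Python sketch. It is not taken from TASCAR; the function names are hypothetical, loudspeakers are represented by direction vectors from the receiver, and $p_{\mathrm{rel}}$ is assumed to be non-zero.

```python
import math


def cardioid_weight(theta):
    """Gain of a virtual cardioid microphone for the angle theta (radians)."""
    return 0.5 * (math.cos(theta) + 1.0)


def angle_between(a, b):
    """Angle in radians between two non-zero 3D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    # Clamp against rounding errors before calling acos.
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))


def nearest_speaker_weights(p_rel, speaker_dirs):
    """Weight 1 for the speaker with the least angular distance, else 0."""
    angles = [angle_between(p_rel, s) for s in speaker_dirs]
    best = angles.index(min(angles))
    return [1.0 if i == best else 0.0 for i in range(len(speaker_dirs))]
```

With four loudspeakers on the coordinate axes of the horizontal plane, a source at (0.2, 3, 0) is assigned to the loudspeaker in direction (0, 1, 0).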

Since the geometry is updated only once in each processing block, all
receiver types interpolate their driving weights so that the processed
geometry is matched at the end of each block. For some receiver types,
e.g., 3rd order Ambisonics, this may lead to a spatial blurring of the
sources if the angular movement within one processing block is large
compared to the spatial resolution of the receiver type.
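
One way to interpolate a driving weight within a block is a linear ramp from the weight of the previous block to the newly computed weight, reaching the new value at the last sample. The Python sketch below shows this for a single output channel; the linear ramp and the function name `render_block` are assumptions of this example, since the text only states that the geometry is matched at the end of each block.

```python
def render_block(y, w_prev, w_new):
    """Apply a weight that ramps linearly from w_prev to w_new.

    y:      block of transmission model output samples
    w_prev: driving weight at the end of the previous block
    w_new:  driving weight for the geometry of the current block
    The weight equals w_new at the last sample of the block.
    """
    n = len(y)
    out = []
    for k, yk in enumerate(y):
        w = w_prev + (w_new - w_prev) * (k + 1) / n
        out.append(w * yk)
    return out
```

For a block of four samples with constant input 1, a weight change from 0 to 1 yields 0.25, 0.5, 0.75, 1.0.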

2.3 Diffuse sources and reverberation 

Diffuse sources, e.g., background signals, or diffuse reverberation
[Wendt et al., 2014], are added in first order ambisonics (FOA) format.
No distance law is applied to diffuse sound sources; instead, they have
a rectangular spatial range box, i.e., they are only rendered if the
receiver is within their range box, with a von-Hann ramp at the
boundaries of the range box. Position and orientation of the range box
can vary with time. The diffuse source signal is rotated by the
difference between receiver orientation and box orientation. Each
receiver type also provides a method to render FOA signals to the
receiver-specific output format.

2.4 Further components 

Besides the open source core of TASCAR in form of a command line
application³, a set of extension modules is commercially developed by
HörTech gGmbH. These components include a graphical user interface, a
time aligned data logging system for open sound control (OSC) messages,
interfaces for motion trackers and electro-oculography, and specialized
content controllers.

3 Example research applications 

In this section, two studies related to hearing aid research which are
based on TASCAR are briefly described, to illustrate possible
applications.

3.1 Individualized head motion strategies

The hypothesis of this study was that task-specific head movement
strategies can be measured on an individual basis. Head movements in a
natural listening environment were assessed. A panel discussion with
four talkers in a simulated room with early reflections was played back
via an eight-channel loudspeaker array, using 3rd order Ambisonics.
Head movements were recorded with the time aligned data logger using a
wireless inertial measurement unit and a converter to OSC messages.

Figure 3 shows five individual head orientation trajectories.
Systematic differences can be observed: Whereas one subject (green
line) performs a searching motion, i.e., modulation around the final
position, at each change of talkers, other subjects adapt more slowly
to source position changes. One subject (blue line) shows a constant
offset, possibly indicating a better-ear listening strategy.

³ https://github.com/gisogrimm/tascar

Figure 3: Intensity of a panel discussion in a room as a function of
time and azimuth (shades of gray) with five individual head
orientations.

3.2 Postural stability 

Some hearing aid users feel disturbed by fast-acting automatics of
hearing aids and the potentially resulting quickly changing binaural
cues. To prepare the ground for further investigations of this effect,
the second study assessed the effect of spatially dynamic acoustic cues
on the postural stability [Busing et al., 2015]. It is based on an
experiment in which it was shown that the presence of a stationary
sound can improve the postural stability in the absence of visual cues
[Zhong and Yost, 2013]. A Fukuda stepping test was performed, in which
the subjects were asked to take 100 steps in a fixed position. The
subject drift was taken as a measure of postural stability.

In this study with 10 young participants with normal vision and
hearing, the effects of the factors vision (open or closed eyes),
stimulus (static or moving) and spatial complexity (two sources or many
sources) on postural stability were analyzed. The stimuli were rendered
with TASCAR; the factors stimulus and spatial complexity were realized
by alternative virtual environments. The environment with low
complexity was a kitchen scene with a frying pan and a clock, either
rendered statically or with a sinusoidal rotation around the listener.
The complex environment was a virtual amusement park, either from a
carousel perspective or from a static position. The subjects were
tracked with the Microsoft Kinect skeleton tracking library. The
positions of the modeled nodes were sent from the Windows PC via OSC to
the TASCAR data logger. The body rotation was measured as the rotation
of the shoulder skeleton nodes. The results are shown in Figure 4.
Vision has the largest effect on the body rotation; with open eyes the
average body rotation during the test is small, independent of the
stimulus and complexity condition. However, without visual cues, the
spatially dynamic complex scene leads to a significantly higher body
rotation than the corresponding complex static scene.

4 Conclusions 

To bridge the gap between laboratory results and real-life experience
in the domain of hearing research and hearing device evaluation, a tool
for acoustic scene creation and rendering (TASCAR) was developed. The
tool focuses on a reproduction of perceptual cues and physical
properties of the sound field which are relevant for typical
applications in hearing device research. Simplifications allow for
computational efficiency. The implementation utilizes the jack audio
connection kit, resulting in high flexibility.

To compute the sound at a given position of the receiver, the signal
coming from each source (primary or image source) is computed based on
the transmission model, i.e., depending on the distance. The receiver
output signal is computed depending on the type of the receiver and the
angle between source and receiver. The receiver signals from all
sources are added up and combined with diffuse sounds, resulting in the
sound of a virtual acoustic environment in a given point.

Figure 4: Body rotation in a Fukuda stepping test in a simple scene
(left panel) and a complex scene (right panel). In the absence of
visual cues, the dynamic cues (red diamonds) have a significant effect
on the body rotation in the complex scene.

Two studies based on the spatial audio reproduction of TASCAR
demonstrate its applicability as a research tool for reproduction of
spatially dynamic acoustic environments.